Abstract

Statistical modeling of experimental physical laws is based on the probability density function of measured variables, which is expressed from experimental data via a kernel estimator. The kernel is determined objectively by the scattering of data during calibration of the experimental setup. A physical law, which relates measured variables, is optimally extracted from experimental data by the conditional average estimator. This estimator is derived directly from the kernel estimator and corresponds to a general nonparametric regression. The proposed method is demonstrated by modeling the return map of noisy chaotic data. In this example, the nonparametric regression is used to predict a future value of a chaotic time series from the present one. The mean square prediction error is used in the definition of predictor quality, while redundancy is expressed through the mean square distance between data points. Both statistics are combined in a new definition of the predictor cost function. From the minimum of the predictor cost function, the proper number of data points in the model is estimated.
I. INTRODUCTION



A basic task of the physical description of natural phenomena is to express relations between experimental data about measured variables in terms of physical laws. Since the corresponding analytical modeling depends essentially on the intuition of the explorer performing it, an ambiguity surrounds this basic task, and the question arises of how it could be avoided. This problem becomes of fundamental practical importance when developing intelligent electronic systems for automatic modeling of physical laws [2]. The ambiguity could be avoided if a unique objective method of modeling were found that takes into account common properties of experimental observations and of transitions from experimental data to models. The aim of this article is to show how such a method can be developed from basic principles of probability and statistics, and to demonstrate an example of its applicability.

A common property of all experimental explorations is that each experiment corresponds to a process proceeding from preparation to execution. If a selected experiment is to yield any information about the phenomenon under observation, its result must not be determined in advance, i.e., several outcomes of the experiment must be possible. A further common property is the repeatability of experiments. Consequently, a correct presentation of experimental observations requires a distribution of experimental results, and this must be related to the concept of probability. The probability distribution is, therefore, a common basis for the description of natural properties in terms of experimental data [3], while the transition from experimental data to an analytical expression of the corresponding probability distribution function is the crucial problem of modeling. An objective solution of this problem is statistical modeling of the probability distribution function by a nonparametric kernel estimator, provided that the kernel is determined by a calibration of the experimental setup. For this purpose, the central limit theorem of probability theory and the maximum entropy principle provide a quite general route to the specification of the kernel function of the estimator. In this case, an experimental physical law, which represents a relation between observed variables, can also be expressed generally by applying the theory of optimal statistical estimators. The resulting nonparametric regression is the conditional average (CA), which can be extracted automatically from the probability density function (PDF) of experimental data in a measurement system. The complete approach to modeling thus appears objective, independent of the intuition of the observer and, consequently, generally applicable for automatic execution. Owing to these convenient properties, CA is widely applicable in various fields of natural and technical sciences.

A nonparametric expression of the PDF by the kernel estimator was already proposed by Parzen [?], [?], but the weaknesses of his proposal are that the kernel function is introduced arbitrarily and that its width is assumed to decrease to zero as the number of data increases to infinity. To avoid this weakness, we specify the kernel function objectively by the scattering of the measurement system output during calibration [3], [?]. The only ambiguity in the expression of the PDF is then related to the number of experimental data, which according to Parzen's assumption should not be limited. Since an infinite number of experiments cannot be performed, a fundamental question arises: "How many experiments is it reasonable to perform in order to explore the phenomenon properly with a given experimental setup?" Intuitively, it is reasonable to repeat experiments for as long as they bring new information. However, as the number of experiments increases, the acquired data points become ever more concentrated in the sample space, and consequently the repetition of experiments becomes redundant. This is observed when distances between data points become comparable to the width of the kernel function. This reasoning recently led to the specification of an information cost function C [4], [5], [?], [10], [11], [12], [13]. For this purpose, the indeterminacy of measurements was first expressed in terms of information entropy, which further led to the definition of the experimental information I and the redundancy R of experiments. Using these statistics, the information cost function was expressed by the difference $C = R - I$. From the position of its minimum, a proper number of experiments can then be determined objectively.

Estimation of the information cost function requires the calculation of integrals, which is inconvenient in a multivariate case. Therefore, another statistic with similar properties but a simpler calculation is sought. Since it has been shown previously that the predictor quality exhibits properties similar to those of the experimental information, we use it here in the definition of the predictor cost function. From its minimum, a proper number of experiments can also be estimated. If this number is used for the adaptation of the nonparametric regression to data provided by experiments, the modeling of the corresponding physical law can be performed automatically on a data acquisition system of the experimental setup. To demonstrate this possibility, we first briefly describe the nonparametric regression and then turn to the definitions of predictor quality, redundancy and cost function. The properties of all statistics are subsequently demonstrated in the modeling of a return map corresponding to a noisy chaotic process.



II. FUNDAMENTALS OF NONPARAMETRIC MODELING 



A. Description of kernel function 

Let us consider a phenomenon that can be described by just two joint variables; the generalization to a multivariate case is straightforward. A single result of a joint measurement is represented by the couple $z = (x, y)$. We next assume that the phenomenon can be characterized statistically by repeated measurements yielding sample points $z_n = (x_n, y_n)$ in the joint span of a two-channel instrument $S_z = S_x \otimes S_y$.

Since instruments are generally subject to stochastic disturbances, the results of measurements are scattered even during repeated calibration [9]. The scattering can be described by the data provided by a series of repeated simultaneous calibrations of both instrument channels. For this purpose, we perform a joint measurement on an object representing two physical units $u_x$ and $u_y$, which we denote together by the joint unit $u = (u_x, u_y)$. The scattering of instrument outputs during calibration is characterized by the joint PDF $\psi(z \mid u)$, which we call the scattering function (SF) [2], [?], [9]. When the interaction between both channels is negligible, the SF is given by the product $\psi(z \mid u) = \psi(x \mid u_x)\,\psi(y \mid u_y)$. Without loss of generality, we further consider a case with equal channels that are subject to mutually independent random disturbances that do not depend on $u$. In such cases, the central limit theorem of probability theory, as well as the maximum entropy principle, suggests expressing the SF of a particular channel by the Gaussian function:

$$
g(x - u_x, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left[-\frac{(x - u_x)^2}{2\sigma^2}\right]. \qquad (1)
$$

The parameters $u_x$ and $\sigma$ represent the mean value and standard deviation of signal $x$ during calibration and can be estimated statistically from the calibration data. The joint SF is then determined by the product $\psi(z - u) = g(x - u_x, \sigma)\, g(y - u_y, \sigma)$.
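As an illustration of Eq. (1) and of the product form of the joint SF, the following minimal Python sketch evaluates $g$ and $\psi$ and estimates $u_x$ and $\sigma$ from repeated calibration readings. The function names and the use of the sample standard deviation (`statistics.stdev`, divisor $n - 1$) are assumptions of this sketch; the text only states that these parameters are estimated statistically.

```python
import math
import statistics

def g(d, sigma):
    """Gaussian scattering function of one channel, Eq. (1), evaluated at d = x - u_x."""
    return math.exp(-d * d / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def psi_joint(x, y, ux, uy, sigma):
    """Joint SF of two equal, non-interacting channels: psi(z - u) = g(x - u_x) g(y - u_y)."""
    return g(x - ux, sigma) * g(y - uy, sigma)

def calibrate(readings):
    """Estimate (u_x, sigma) from repeated calibration readings of one channel.

    Hypothetical helper: sample mean and sample standard deviation (divisor n - 1);
    the source does not specify which estimator is used.
    """
    return statistics.mean(readings), statistics.stdev(readings)

# Example: five repeated readings of a reference object with unit value 1.0
u_x, sigma = calibrate([1.02, 0.98, 1.01, 0.99, 1.00])
print(u_x, sigma)
```

The resulting $\sigma$ is the only free parameter of the kernel in the following sections, which is what makes the kernel "objective" in the sense of the text.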

When reporting experimental results, experimentalists most often specify only the mean values and standard deviations of variables during calibration. The maximum entropy principle tells us that, in such cases, the Gaussian function is the best choice for the SF [2], [9].



B. Nonparametric estimation of PDF pertaining to experimental data 

When we perform a single measurement, we get a sample $z_1 = (x_1, y_1)$ that represents the mean value of $z$ during the measurement and, therefore, we express the PDF as $\psi(z - z_1) = \psi(x - x_1)\,\psi(y - y_1)$. When we repeat the measurement $N$ times, we get a set of samples $\{z_i;\ 1 \le i \le N\}$, by which we model the joint PDF by the statistical average

$$
f(z) = \frac{1}{N} \sum_{i=1}^{N} \psi(z - z_i), \qquad (2)
$$

which represents the kernel estimator.
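A direct transcription of the kernel estimator of Eq. (2) for the Gaussian SF may help make the averaging explicit. This sketch assumes two equal channels with the same width $\sigma$; `kernel_pdf` returns $f(z)$ at a point $z = (x, y)$, and all names are illustrative only.

```python
import math

def gauss(d, sigma):
    """One-channel Gaussian scattering function, Eq. (1)."""
    return math.exp(-d * d / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def kernel_pdf(x, y, samples, sigma):
    """Kernel estimator of the joint PDF, Eq. (2).

    samples is a list of (x_i, y_i) pairs; each sample contributes the product
    kernel psi(z - z_i) = g(x - x_i) g(y - y_i), and the contributions are averaged.
    """
    total = sum(gauss(x - xi, sigma) * gauss(y - yi, sigma) for xi, yi in samples)
    return total / len(samples)

samples = [(0.2, 0.3), (0.5, 0.7), (0.8, 0.4)]
print(kernel_pdf(0.5, 0.7, samples, 0.1))
```

Because every kernel is itself normalized, the average is a normalized PDF for any $N$, which can be checked by summing $f(z)$ over a fine grid.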

The properties of the particular components $x$, $y$ are described by the marginal PDFs $f(x)$, $f(y)$. They are obtained from the joint PDF by integration with respect to one component, for example:

$$
f(x) = \int_{S_y} f(z)\, dy = \frac{1}{N} \sum_{i=1}^{N} \psi(x - x_i). \qquad (3)
$$

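For modeling natural laws, the most important quantity is the conditional PDF of the variable $y$ at a given value of $x$, defined as

$$
f(y \mid x) = \frac{f(z)}{f(x)} = \frac{\sum_{i=1}^{N} \psi(x - x_i)\, \psi(y - y_i)}{\sum_{j=1}^{N} \psi(x - x_j)}. \qquad (4)
$$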
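Eqs. (3) and (4) translate into a few lines of code. The sketch below, under the same assumption of equal Gaussian channels, evaluates the marginal $f(x)$ and the conditional $f(y \mid x)$. Note that the conditional density becomes numerically undefined (0/0) when $x$ lies so far from all $x_i$ that every kernel underflows; this sketch does not handle that case.

```python
import math

def gauss(d, sigma):
    """One-channel Gaussian scattering function, Eq. (1)."""
    return math.exp(-d * d / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def marginal_x(x, samples, sigma):
    """Marginal PDF f(x), Eq. (3): integrating Eq. (2) over y leaves only the x-kernels."""
    return sum(gauss(x - xi, sigma) for xi, _ in samples) / len(samples)

def conditional_y_given_x(y, x, samples, sigma):
    """Conditional PDF f(y | x) = f(z) / f(x), Eq. (4); the factors 1/N cancel."""
    num = sum(gauss(x - xi, sigma) * gauss(y - yi, sigma) for xi, yi in samples)
    den = sum(gauss(x - xi, sigma) for xi, _ in samples)
    return num / den

samples = [(0.2, 0.3), (0.5, 0.7), (0.8, 0.4)]
print(conditional_y_given_x(0.7, 0.5, samples, 0.1))
```

For a fixed $x$, $f(y \mid x)$ is a mixture of the $y$-kernels weighted by how close each $x_i$ is to $x$, which is the structure exploited by the conditional average below.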



C. Estimation of a physical law 

Distributions of joint experimental data, for example the one shown in Fig. 1, often resemble a ridge along some hypothetical line $y_o(x)$, which we want to extract from the given data in an optimal way. For this purpose, we select from a set of joint data only those that pertain to some selected $x$. These joint data generally exhibit various values of $y$, which we try to represent by a single value called the predictor of the variable $y$ from a given value $x$. We consider as the optimal predictor of the hypothetical $y_o$ the value $y_p$ at which the mean square prediction error is minimal:

$$
E[(y_p - y)^2 \mid x] \;\longrightarrow\; \min_{y_p}. \qquad (5)
$$
FIG. 1: The joint PDF $f(z)$ used to demonstrate the properties of the conditional average estimator.

Here $E[\,\cdot \mid x]$ denotes statistical averaging at the given condition $x$. The minimum satisfies the equation $\partial E[(y_p - y)^2 \mid x] / \partial y_p = 0$, which yields as the optimal predictor $y_p$ the conditional average:

$$
y_p(x) = E[y \mid x] = \int_{S_y} y\, f(y \mid x)\, dy. \qquad (6)
$$
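The step from Eq. (5) to Eq. (6) can be made explicit by splitting the conditional mean square error into a bias term and the conditional variance, the latter not depending on $y_p$:

$$
E[(y_p - y)^2 \mid x] = \big(y_p - E[y \mid x]\big)^2 + \mathrm{Var}(y \mid x),
\qquad
\frac{\partial}{\partial y_p} E[(y_p - y)^2 \mid x] = 2\big(y_p - E[y \mid x]\big).
$$

Setting the derivative to zero gives $y_p = E[y \mid x]$; since the second derivative equals $2 > 0$, this stationary point is the minimum, and the residual error at the optimum is the conditional variance $\mathrm{Var}(y \mid x)$.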

By using Eq. (4), we obtain for the conditional average the expansion

$$
y_p(x) = \frac{\sum_{i=1}^{N} y_i\, \psi(x - x_i)}{\sum_{j=1}^{N} \psi(x - x_j)} = \sum_{i=1}^{N} y_i\, B_i(x). \qquad (7)
$$
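The passage from Eq. (4) to Eq. (7) uses one intermediate step. Assuming that $S_y$ is wide enough for the tails of each kernel to be negligible at its boundaries (an assumption not stated explicitly in the text), the first moment of each Gaussian kernel equals its center, $\int_{S_y} y\, \psi(y - y_i)\, dy = y_i$, so that

$$
y_p(x) = \int_{S_y} y\, \frac{\sum_{i} \psi(x - x_i)\, \psi(y - y_i)}{\sum_{j} \psi(x - x_j)}\, dy
= \frac{\sum_{i} \psi(x - x_i) \int_{S_y} y\, \psi(y - y_i)\, dy}{\sum_{j} \psi(x - x_j)}
= \frac{\sum_{i} y_i\, \psi(x - x_i)}{\sum_{j} \psi(x - x_j)}.
$$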

The coefficients of this expansion are the sample values $y_i$, while the basis functions are

$$
B_i(x) = \frac{g(x - x_i, \sigma)}{\sum_{j=1}^{N} g(x - x_j, \sigma)}, \qquad (8)
$$

and satisfy the following conditions:

$$
\sum_{i=1}^{N} B_i(x) = 1, \qquad 0 \le B_i(x) \le 1. \qquad (9)
$$
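A minimal Python sketch of the CA estimator, Eqs. (7) and (8), follows. Because the normalization constant of Eq. (1) cancels in Eq. (8), the sketch uses only the exponential and shifts all squared distances by the smallest one before exponentiating. This leaves $B_i(x)$ unchanged but avoids 0/0 when $x$ is far from every $x_i$; the shift is an implementation choice of this sketch, not part of the method described in the text.

```python
import math

def basis(x, xs, sigma):
    """Basis functions B_i(x), Eq. (8): normalized Gaussian similarities of x to each x_i."""
    d2 = [(x - xi) ** 2 for xi in xs]
    shift = min(d2)  # common factor, cancels between numerator and denominator of Eq. (8)
    w = [math.exp(-(d - shift) / (2.0 * sigma * sigma)) for d in d2]
    total = sum(w)
    return [wi / total for wi in w]

def ca_predict(x, samples, sigma):
    """Conditional average y_p(x) = sum_i y_i B_i(x), Eq. (7)."""
    xs = [xi for xi, _ in samples]
    return sum(yi * b for (_, yi), b in zip(samples, basis(x, xs, sigma)))

samples = [(0.0, 1.0), (1.0, 3.0)]
print(ca_predict(0.5, samples, 0.2))   # midway between the samples, equal weights: 2.0
print(ca_predict(0.0, samples, 0.2))   # dominated by the nearest sample (x_1 = 0)
```

The second call illustrates the interpretation of $B_i(x)$ as a similarity measure: the sample whose $x_i$ coincides with the query contributes almost all of the weight.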

The basis functions $B_i(x)$ can be interpreted as a normalized measure of similarity between the given value $x$ and the sample value $x_i$. At a given $x$, the largest contribution to the estimated value $y_p(x)$ comes from the sample value $y_m$ whose complementary sample value $x_m$ is most similar to $x$.

The calculation of $y_p(x)$ corresponds to an associative recall of memorized items, which is a property of intelligence. Therefore, the estimator $y_p(x)$ could be treated as a basis for the development of a machine intelligence based on the modeling of natural laws. The conditional average given in Eq. (7) in fact corresponds to a normalized radial basis function neural network, which is equivalent to a multilayer perceptron, the basic paradigm used in the theory of artificial neural networks [?], [?].

III. CHARACTERISTICS OF THE MODEL 
A. Predictor quality 

A predictor maps the stochastic variable $x$ to a new stochastic variable $y_p$ that generally differs from the variable $y$. When the variables $x$, $y$ are related by some hypothetical physical law $y_o(x)$ and the measurement noise is small, the first and second statistical moments $E[y - y_p]$, $E[(y - y_p)^2]$ of the prediction error are also small. The second moment is

$$
E[(y - y_p)^2] = \mathrm{Var}(y) + \mathrm{Var}(y_p) - 2\,\mathrm{Cov}(y, y_p) + [m(y) - m(y_p)]^2,
$$

where $E$, $m$, $\mathrm{Var}$ and $\mathrm{Cov}$ denote the statistical average, mean value, variance and covariance, respectively. In the case of statistically independent variables $y$ and $y_p$ with equal mean values, the last two terms are zero and we get $E[(y - y_p)^2] = \mathrm{Var}(y) + \mathrm{Var}(y_p)$. With respect to this relation, we define the predictor quality in relative terms by the formula

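$$
Q = 1 - \frac{E[(y - y_p)^2]}{\mathrm{Var}(y) + \mathrm{Var}(y_p)} = \frac{2\,\mathrm{Cov}(y, y_p)}{\mathrm{Var}(y) + \mathrm{Var}(y_p)} - \frac{[m(y) - m(y_p)]^2}{\mathrm{Var}(y) + \mathrm{Var}(y_p)}. \qquad (10)
$$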

The quality is 1 if the prediction is exact, $y_p = y$, while it is 0 if $y$ and $y_p$ are statistically independent and have equal mean values. The quality $Q$ may be negative if $m(y) \ne m(y_p)$. For the predictor defined by the conditional average $y_p(x) = \int y\, f(y \mid x)\, dy$, we analytically obtain the equalities $m(y) = m(y_p)$ and $\mathrm{Cov}(y, y_p) = \mathrm{Var}(y_p)$, which yield

$$
Q = \frac{2\,\mathrm{Var}(y_p)}{\mathrm{Var}(y) + \mathrm{Var}(y_p)}. \qquad (11)
$$
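The two equalities used in Eq. (11) follow from the law of total expectation, applied to the exact conditional average $y_p = E[y \mid x]$, which is a function of $x$ only:

$$
\begin{aligned}
m(y_p) &= E\big[E[y \mid x]\big] = E[y] = m(y), \\
\mathrm{Cov}(y, y_p) &= E[y\, y_p] - m(y)\, m(y_p) = E\big[y_p\, E[y \mid x]\big] - m(y_p)^2 = E[y_p^2] - m(y_p)^2 = \mathrm{Var}(y_p).
\end{aligned}
$$

Inserting these into the second form of Eq. (10) removes the last term and gives Eq. (11). The same decomposition also yields $\mathrm{Var}(y) = \mathrm{Var}(y_p) + E[\mathrm{Var}(y \mid x)] \ge \mathrm{Var}(y_p)$, the inequality invoked in the following paragraph.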

From the definition of the conditional average it follows that $0 \le \mathrm{Var}(y_p) \le \mathrm{Var}(y)$ and therefore $0 \le Q \le 1$. This inequality need not be fulfilled exactly if the CA is estimated statistically from a finite number of samples.
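In practice, $Q$ is estimated from finite samples, for example from test values $y_t$ and predictions $y_p$. The sketch below implements the first form of Eq. (10). It uses population (divisor $n$) moments throughout, an assumption chosen so that the two forms of Eq. (10) agree exactly for sample moments, which the usage lines check on random data.

```python
import random
import statistics

def predictor_quality(y_true, y_pred):
    """Predictor quality Q, first form of Eq. (10): 1 - E[(y - y_p)^2] / (Var(y) + Var(y_p)).

    All moments are population (divisor n) estimates from the paired samples.
    """
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return 1.0 - mse / (statistics.pvariance(y_true) + statistics.pvariance(y_pred))

# Usage: compare with the second form of Eq. (10) on arbitrary random data
rng = random.Random(1)
y = [rng.random() for _ in range(50)]
yp = [0.8 * v + 0.1 * rng.random() for v in y]
my, myp = statistics.fmean(y), statistics.fmean(yp)
cov = sum((a - my) * (b - myp) for a, b in zip(y, yp)) / len(y)
denom = statistics.pvariance(y) + statistics.pvariance(yp)
q_second_form = (2.0 * cov - (my - myp) ** 2) / denom
print(predictor_quality(y, yp), q_second_form)
```

A constant predictor equal to the mean of $y$ gives $Q = 0$, and a constant predictor with a wrong mean gives $Q < 0$, matching the limiting cases listed in the text.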

With increasing $N$, we generally expect that the CA estimated statistically by Eq. (7) represents the governing physical law increasingly well and, consequently, that the corresponding predictor quality $Q$ on average increases to a certain limit value. As mentioned previously, an unlimited increase in the number of experiments is experimentally impossible, and consequently the question arises of how to determine a proper number $N_o$ of data that will yield a judicious estimate of the governing law.



B. Redundancy and predictor cost function 

To answer this question, we have analyzed various experimental cases, which have shown that, with an increasing number of experimental samples, the predictor quality generally stabilizes when the distance between data points becomes similar to the width $\sigma$ of the scattering function. Therefore, it is not reasonable to substantially exceed the corresponding number of data. This can be achieved by considering the ratio of $\sigma$ to a proper measure $\delta$ of the distance between neighboring data points. For this purpose, we introduce $\delta$ through the mean value of the minimum square distance between data points,

$$
\delta^2 = E\left[\min_{j \ne i} \left\{(x_i - x_j)^2 + (y_i - y_j)^2\right\}\right], \qquad i, j = 1, \dots, N,
$$

and define a measure of the redundancy of data by the relative variable

$$
R = 2N \frac{\sigma^2}{\delta^2}. \qquad (12)
$$

Since $\delta^2$ comprises two terms, the contributions from the $x$ and $y$ components, a factor 2 is used in the numerator. The fraction $2\sigma^2/\delta^2$ represents the average increase of redundancy assigned to the acquisition of a new data point. To account for the acquisition of $N$ data points, the factor $N$ is further used. With respect to this, we introduce the predictor cost function by the sum

$$
\begin{aligned}
C &= R - Q + 1 \\
  &= 2N \frac{\sigma^2}{\delta^2} + \frac{E[(y - y_p)^2]}{\mathrm{Var}(y) + \mathrm{Var}(y_p)}.
\end{aligned}
\qquad (13)
$$

The constant 1 is inserted in the first row in order to obtain a simpler expression in the second row of Eq. (13). In the same way as the information cost function defined in [?], [?], the cost function is expressed here in a relative form comprising two terms: the first corresponds to the redundancy of experiments due to inaccurate measurements, while the second represents the influence of the acquisition of information about the phenomenon by experiments. With an increasing number of samples $N$, the redundancy on average increases, while the second term decreases with the decreasing error. Therefore, the cost function $C$ exhibits a minimum at some $N_o$ that represents a proper number of data needed for modeling the physical law governing the explored phenomenon. The first term becomes dominant when the distance $\delta$ between data points becomes substantially smaller than the width $\sigma$ of the scattering function.
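Eqs. (12) and (13) can be evaluated directly from a model sample set and a set of test values with their predictions. The following sketch computes $\delta^2$ by brute force over all pairs ($O(N^2)$, adequate for the small $N$ considered here), the redundancy $R$, and the cost function $C$ in its first form, $R - Q + 1$. The function names are hypothetical, and coincident data points ($\delta^2 = 0$) are not handled.

```python
import statistics

def mean_min_square_distance(samples):
    """delta^2: average over i of min_{j != i} [(x_i - x_j)^2 + (y_i - y_j)^2]."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two data points are needed")
    total = 0.0
    for i, (xi, yi) in enumerate(samples):
        total += min((xi - xj) ** 2 + (yi - yj) ** 2
                     for j, (xj, yj) in enumerate(samples) if j != i)
    return total / n

def redundancy(samples, sigma):
    """Redundancy R = 2 N sigma^2 / delta^2, Eq. (12)."""
    return 2.0 * len(samples) * sigma ** 2 / mean_min_square_distance(samples)

def cost_function(samples, sigma, y_test, y_pred):
    """Predictor cost C = R - Q + 1, Eq. (13), with Q from the first form of Eq. (10)."""
    mse = sum((a - b) ** 2 for a, b in zip(y_test, y_pred)) / len(y_test)
    q = 1.0 - mse / (statistics.pvariance(y_test) + statistics.pvariance(y_pred))
    return redundancy(samples, sigma) - q + 1.0

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(mean_min_square_distance(corners), redundancy(corners, 0.1))
```

With exact predictions $Q = 1$ and $C$ reduces to $R$, which makes the role of the two terms in Eq. (13) easy to see.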

IV. EXAMPLE 

To demonstrate the properties of the CA estimator, we use data generated by a noise-corrupted chaotic return map with the span $S_x = (0, 1)$. This example is chosen because similar cases often appear in the analysis of chaotic time series. The basic problem in such an analysis is to extract the return map from a given record of a time series that is influenced by additive noise of instrumental origin. In our case, we use analytically generated data to enable a comparison between the original and the extracted physical law and to make an objective reproduction of the complete method feasible. The governing law is here given by the logistic map

$$
X_{n+1} = 3.8\, X_n (1 - X_n), \qquad (14)
$$

while the initial value $X_1$ is selected arbitrarily from the interval (0, 1) using a random generator. Gaussian noise $v$ of zero mean value and standard deviation $\sigma = 0.1$ is added to the values of the generated chaotic series to simulate additive measurement noise. The iterative solution of Eq. (14) then yields a series of noise-corrupted chaotic values $x_n = X_n + v_n$. Figure 2 shows two records of such a series that were used in modeling and testing the proposed method.
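The data generator of Eq. (14) with additive measurement noise can be sketched as follows. The noise is added only to the recorded values $x_n$ and does not feed back into the iteration of $X_n$, which is how we read the description above. The function name, the seeds and the use of Python's `random.Random` are assumptions of this sketch, so it will not reproduce the particular records of Fig. 2.

```python
import random

def noisy_logistic_series(n, sigma, a=3.8, seed=None):
    """Return n noise-corrupted values x_k = X_k + v_k of the logistic map, Eq. (14).

    X_1 is drawn uniformly from (0, 1); v_k is Gaussian with zero mean and standard
    deviation sigma, added to the recorded values only (measurement noise).
    """
    rng = random.Random(seed)
    big_x = rng.uniform(0.0, 1.0)
    record = []
    for _ in range(n):
        record.append(big_x + rng.gauss(0.0, sigma))
        big_x = a * big_x * (1.0 - big_x)
    return record

model_record = noisy_logistic_series(31, 0.1, seed=1)
test_record = noisy_logistic_series(61, 0.1, seed=2)   # different seed, as for the test data
```

A record of $N + 1$ values yields $N$ pairs $(x_n, x_{n+1})$, hence the lengths 31 and 61 in the usage lines.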

From the series $\{x_n;\ n = 1, 2, \dots\}$, the joint samples of the basic variables $x$, $y$ are obtained by treating the successive value $x_{n+1}$ as the dependent variable: $y_n = x_{n+1}$. The generator of data is thus described analytically by the rule

$$
x_n = X_n + v_n, \qquad y_n = x_{n+1}, \qquad (15)
$$

while the governing law is given by $y = 3.8\, x(1 - x)$. The sample points $\{x_n, y_n;\ n = 1, \dots, N\}$ are distributed along the corresponding parabola in the sample space. According to our previous treatment, the standard deviation $\sigma$ corresponds to the width of the instrument scattering function $\psi$. The joint PDF shown in Fig. 1 is determined by the kernel estimator Eq. (2) using 200 data points, while a reduced set of 30 data points is further used to demonstrate the properties of the conditional average estimator. The data obtained from the pure chaos generator are shown by $y_o$ (• • •) in the top parabola of Fig. 3, while the basic noise-corrupted data $y$ (* * *) are shown by points scattered around the pure data points.

FIG. 3: Testing of the CA predictor. The graphs represent the governing law $y_o$ and basic data $y$ (top two: • • •; * * *), test data $y_t$ and predicted data $y_p$ (middle two: + + +; o o o), and the prediction error $E_r = y_p - y_t$ (bottom). The upper two parabolas are displaced successively by 0.35 in the vertical direction for better visualization.

The conditional average estimator is obtained by inserting data from the basic data set into Eq. (7). To demonstrate its performance, we additionally generated, with different seeds of the random generators, a set of $N_t = 60$ test data $\{x_{t,i}, y_{t,i}\}$. Based on the data $x_{t,i}$ from this set, the corresponding values $y_{p,i}$ are predicted by the CA estimator. The test and predicted data are shown in Fig. 3 by the middle two sets of points (+ + + and o o o). The prediction error $E_r = y_p - y_t$, calculated from both data sets, is presented at the bottom of Fig. 3. Relatively small differences between predicted and test points indicate that the properties of the governing law $y_o(x)$ are properly modeled by the CA estimator. To confirm this qualitative conclusion, we next analyze the statistics $E[(y_p - y_t)^2]$, $Q$, $\delta^2$, $R$ and $C$ as functions of the number of data $N$ used in modeling. The number of test data is kept constant, $N_t = 60$, during the calculation of these statistics. The properties of the statistical model of the governing law depend on the sets of samples used in modeling and testing. To demonstrate this dependence, we repeated the modeling and testing three times using different statistical sample sets.

FIG. 4: Mean square prediction error $E[(y - y_p)^2]$ as a function of the data number $N$.

The mean square prediction error $E[(y - y_p)^2]$ is presented in Fig. 4 versus the number of samples $N$. Its value varies statistically but, on average, decreases with increasing $N$. Statistical fluctuations are largest at small $N$ and depend significantly on the samples used in modeling. However, with increasing $N$, the statistical fluctuations become ever less pronounced. If the number of test samples $N_t$ is much larger than the number of samples $N$, changing the test sample set does not significantly influence the properties of the estimated statistics, which is the case in our demonstration. This is why we use the value $N_t = 60$.

FIG. 5: Predictor quality $Q$ as a function of the data number $N$.

The predictor quality $Q$, as determined from the prediction error, is presented in Fig. 5 versus the number of samples $N$. For each data set, the statistical fluctuations decrease with increasing $N$, so that qualities calculated from different data sets converge to the same limit value. With increasing $N$, the curves determined from different data sets merge at approximately $N \approx 11$. The quality there is $\approx 0.97$ and rises to $\approx 0.98$ at $N = 30$. At $N \approx 11$, the difference between the curves obtained from different data sets is about two orders of magnitude smaller than the corresponding quality. With respect to these properties, we conjecture that in the present case about 11 data values already suffice for a judicious modeling of the governing law $y_o(x)$ by the CA predictor.

To confirm this conjecture, we turn to the determination of the predictor cost function. For this purpose, let us first analyze the properties of the mean square distance $\delta^2$ between data points. The corresponding graph, shown in Fig. 6, indicates that $\delta^2$ decreases nearly monotonically with the number of samples, approximately as $1/N$. Consequently, the corresponding redundancy $R$ increases with $N$ approximately as $N^2$. This conclusion is confirmed by the graph in Fig. 7.
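The scaling of $R$ follows directly from Eq. (12): if $\delta^2 \approx c/N$ with some constant $c$ (the empirical behavior read from Fig. 6, not an exact law), then

$$
R = 2N\frac{\sigma^2}{\delta^2} \approx 2N\frac{\sigma^2 N}{c} = \frac{2\sigma^2}{c}\, N^2,
$$

so that, for example, doubling $N$ roughly quadruples the redundancy.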

Following the definition given by Eq. (13), we obtain from the estimated error and the redundancy the predictor cost function $C$ shown in Fig. 8. Its minimum is not very pronounced. From various statistical data sets, we obtain the estimate of the minimal value $C_o = 0.033 \pm 0.006$. The corresponding number $N_o = 10 \pm 2$ confirms our previous conjecture stemming from the analysis of predictor quality.

FIG. 6: Mean square distance between data points $\delta^2$ as a function of the data number $N$.

FIG. 7: Redundancy $R$ as a function of the data number $N$.
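Putting the pieces together, the search for $N_o$ can be sketched as a scan over $N$: for each $N$, the CA is built from the first $N$ pairs of a model record, $Q$ is estimated on a fixed test set, and $C = R - Q + 1$ is recorded; $N_o$ is taken as the position of the smallest $C$. The noise level is taken from the text, while the seeds and scan range are arbitrary choices for illustration. The resulting $N_o$ is therefore not expected to match the values reported above, and a single run inherits the statistical fluctuations discussed in the text.

```python
import math
import random
import statistics

def noisy_logistic_series(n, sigma, a=3.8, seed=None):
    rng = random.Random(seed)
    big_x, record = rng.uniform(0.0, 1.0), []
    for _ in range(n):
        record.append(big_x + rng.gauss(0.0, sigma))  # additive measurement noise
        big_x = a * big_x * (1.0 - big_x)             # logistic map, Eq. (14)
    return record

def pairs(record):
    return list(zip(record[:-1], record[1:]))         # y_n = x_{n+1}, Eq. (15)

def ca_predict(x, samples, sigma):
    d2 = [(x - xi) ** 2 for xi, _ in samples]
    shift = min(d2)
    w = [math.exp(-(d - shift) / (2.0 * sigma * sigma)) for d in d2]
    return sum(wi * yi for wi, (_, yi) in zip(w, samples)) / sum(w)   # Eqs. (7)-(8)

def quality(y_t, y_p):
    mse = sum((a - b) ** 2 for a, b in zip(y_t, y_p)) / len(y_t)
    return 1.0 - mse / (statistics.pvariance(y_t) + statistics.pvariance(y_p))   # Eq. (10)

def delta2(samples):
    return sum(min((xi - xj) ** 2 + (yi - yj) ** 2
                   for j, (xj, yj) in enumerate(samples) if j != i)
               for i, (xi, yi) in enumerate(samples)) / len(samples)

def scan_cost(model_pairs, test_pairs, sigma, n_max):
    """Return {N: C(N)} for N = 2..n_max, with C = R - Q + 1, Eq. (13)."""
    y_t = [y for _, y in test_pairs]
    costs = {}
    for n in range(2, n_max + 1):
        model = model_pairs[:n]
        y_p = [ca_predict(x, model, sigma) for x, _ in test_pairs]
        r = 2.0 * n * sigma ** 2 / delta2(model)                      # Eq. (12)
        costs[n] = r - quality(y_t, y_p) + 1.0
    return costs

sigma = 0.1
costs = scan_cost(pairs(noisy_logistic_series(31, sigma, seed=1)),
                  pairs(noisy_logistic_series(61, sigma, seed=2)), sigma, 30)
n_opt = min(costs, key=costs.get)
print(n_opt, costs[n_opt])
```

Averaging the curves $C(N)$ over several seeds, as done in the text with three sample sets, would give a more stable estimate of $N_o$ than a single scan.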

With an increasing number of samples $N$, the quality $Q(N)$ of the CA predictor converges to some limit value $Q_\infty$ that characterizes the hypothetical maximum quality of the proposed nonparametric statistical modeling. This limit value generally increases with decreasing scattering width $\sigma$. Correspondingly, the minimal value of the cost function decreases and occurs at a larger $N$; for instance, at $\sigma = 0.005$ we get $C_o = 0.018 \pm 0.003$ and $N_o = 14 \pm 3$. However, the limit value $Q_\infty$ is less than 1 if $1/\sigma$ and $N$ are finite. This means that it is not possible to determine exactly the governing physical law $y = y_o(x)$ from joint data obtained by an instrument influenced by stochastic disturbances.

FIG. 8: Predictor cost function $C$ as a function of the data number $N$.



V. DISCUSSION 



Our method of estimating natural laws from given data can be generalized simply to multivariate cases by substituting the corresponding vectors for the variables $x$, $y$. Such modeling has already been applied in a variety of examples stemming from physical [15], technical [2], [16], economic [?] and medical environments [?], [?], [13]. Particularly in economic and medical environments, phenomena are often characterized by many variables that could be either informative or disturbing. Due to the complexity of such cases, there usually exists little or no information about a possible function that could describe the governing law. Accordingly, researchers are faced with the problem of how to define complexity and how to reduce it by extracting informative variables from a given set [19]. Alongside mutual information, the predictor quality could also be applied for this purpose. For instance, it has recently been shown in the field of medicine how an analysis of predictor quality can provide an ordering of variables and the extraction of a set that yields an optimal predictor of the disease healing process [17], [18]. Such an analysis makes further progress toward the origins of the treated disease feasible.

The value of the proper number $N_o$, as defined by the minimum of the predictor cost function, could be interpreted as a measure of the complexity of an adequate predictor model. Importantly, this measure depends only on the accuracy of observation and the properties of the phenomenon represented by the given experimental data.

In relation to the example demonstrated here, an important conclusion emerges about the description of natural phenomena by physical laws in the form $y = y_o(x)$. As long as such a law is considered the only basis for the description of the phenomenon, it is not sufficient for a complete description, since it provides no information about the properties of the sample space of joint data. Consider a well-known example: the law $m = \rho V$ that relates the mass $m$, the volume $V$ and the density $\rho$ of an object. This law does not include the restriction $m > 0$ and is in this respect incomplete. Similar, but much more complex, examples are met when treating chaotic phenomena and their strange attractors [14]. For example, the law applied here is a special case of the law $X_{n+1} = a X_n (1 - X_n)$, with $a$ a constant. Depending on the value of $a$ and the starting value $X_1$, the series $\{X_n;\ n = 1, 2, \dots\}$ exhibits, as $n \to \infty$, either a discrete or a continuous sample space. Moreover, in the continuous case, the sample space can consist of disconnected intervals, which could hardly be predicted analytically. The situation is similar, but still more cumbersome, if we consider chaotic processes with continuous parameters. Consequently, a governing law $y = y_o(x)$ appears incomplete for the description of the phenomenon. Its most outstanding deficiency is that it does not include information about the structure of the sample space corresponding to the observed phenomenon. This deficiency does not appear if we take as the basis for modeling the probability density function and estimate it nonparametrically, directly from measured joint data. The extraction of a law that describes a relation between variables can then be performed generally by using the conditional average estimator. However, simple parametric laws, like $m = \rho V$, are of tremendous importance for the analytical sciences, and we do not expect that the proposed nonparametric models could substitute for them, although the nonparametric models are convenient for direct applications. Consequently, the question arises of how to find an unambiguous link between both paradigms of modeling.
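The dependence of the sample space on the parameter $a$ can be probed numerically. The sketch below iterates $X_{n+1} = a X_n (1 - X_n)$, discards a transient, and counts the distinct values (rounded to six decimals) visited afterwards: a small count indicates a discrete sample space (a stable periodic orbit), a large count a continuous one. The transient length, rounding and starting value are arbitrary choices of this sketch, and such a finite-precision count is only a heuristic indicator, not a proof.

```python
def visited_values(a, x1=0.3, transient=10_000, keep=1_000, ndigits=6):
    """Distinct values of the logistic map X_{n+1} = a X_n (1 - X_n) after a transient."""
    x = x1
    for _ in range(transient):
        x = a * x * (1.0 - x)
    seen = set()
    for _ in range(keep):
        x = a * x * (1.0 - x)
        seen.add(round(x, ndigits))
    return seen

for a in (2.8, 3.2, 3.5, 3.8):
    print(a, len(visited_values(a)))
```

For $a = 2.8$ the orbit settles on a single fixed point and for $a = 3.2$ on a two-point cycle, while for $a = 3.8$, the value used in Eq. (14), the visited values fill a continuum; none of this structure is visible in the law $y = a x (1 - x)$ alone.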
VI. CONCLUSIONS



Our approach indicates that the objectively introduced kernel estimator enables nonparametric statistical modeling of a quantitatively explored phenomenon. Since no a priori information about the form of the governing physical law is required, the modeling can be performed automatically by a computer in a measurement system. The proposed predictor cost function $C$ enables estimation of the proper number $N_o$ of data needed for the modeling. The properties of the predictor cost function resemble those of the information cost function [?], but its estimation is much simpler. The properties of the extracted model of the governing law can be described quantitatively by the predictor quality $Q$ and the redundancy $R$ of the data from which the governing law is extracted. This law represents the distribution of the variable $y$ at a given value $x$ by a single predicted value $y_p(x)$. Such a compressed representation generally corresponds to the creation of information about the explored phenomenon. This is in contrast to the loss of information caused by stochastic disturbances in signal transmission channels [?]. If the extraction of information from observations is considered a basis of natural intelligence [21], [22], then a system capable of estimating a physical law from measured data autonomously must be treated as an intelligent unit. Such an interpretation provides a common basis for a unified treatment of experimental sciences and natural or artificial intelligence [22].